Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT: Derived memh #10332

Open
wants to merge 23 commits into
base: master
Choose a base branch
from
Open

UCT: Derived memh #10332

wants to merge 23 commits into from

Conversation

iyastreb
Copy link
Contributor

@iyastreb iyastreb commented Nov 26, 2024

What?

This PR is a composite part of the new memory invalidation design based on the concept of memory windows.
Here we introduce changes on the UCT side.
On UCT side memory window represents a shallow copy of some existing UCT memory handle, which can be used to access the same memory region. When created, the memory window inherits the original memh access flags and state, and takes ownership of the indirect keys of the original memory handle. The lifetime of the memory window is bound to the original memh, and the original memh cannot be destroyed until all its memory windows are destroyed.

Design doc

Why?

The overall idea is to employ a lightweight invalidation: when failure happens we invalidate only the exposed indirect rkeys, but don't deregister the entire memory region, which is an expensive operation. Lightweight invalidation enables invalidation workflow for RNDV pipeline protocol.

How?

  1. Introduced a new option for uct_md_mem_reg_params_t: UCT_MD_MEM_WINDOW. Existing mem_reg call will create a memory window based on existing UCT memh
  2. When memory window is requested, create a shallow copy of base UCT memh and take the ownership of its indirect memory keys.

Comment on lines 886 to 889
base->atomic_dvmr = NULL;
base->atomic_rkey = UCT_IB_INVALID_MKEY;
base->indirect_dvmr = NULL;
base->indirect_rkey = UCT_IB_INVALID_MKEY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to invalidate basic memh keys? I'd expect that this will break (at least performance) of some (at least atomics) operations on the base key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should not hinder performance, because base memh rkeys will be recreated on next mkey_pack, like is done in my test.
But you are right, strictly speaking it's NOT necessary to transfer rkeys ownership to the derived memh. It will work even without this transfer. I will remove this ownership transfer logic

The reason why I added this transfer is the following:
Currently it's UCT layer that decides whether to create rkeys (atomic + indirect), and conditions for them are different:

  • atomic rkey is created if user requested it (with UCT_MD_MEM_ACCESS_REMOTE_ATOMIC), and hardware supports it. So it might be created no matter if invalidation is needed
  • indirect rkey is created only when invalidation is needed
    Now from UCP layer we assume that we will create a memory window when invalidation if needed from EP config. By this time atomic rkeys on base memh might be already created and I thought it's a good idea to move it to the memory window, to avoid re-creating this key per MW.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed ownership transfer for now, let's consider this later


/* Test case 4: base memh can still be used to pack mkeys */
std::vector<uint8_t> base_rkey2 = mkey_pack(base, flags);
EXPECT_NE(base_rkey1, base_rkey2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i do not think it is correct. base memh should not be modified once derived memh is created

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fixed this

uct_mem_h mw1 = reg_mem_window(base);

/* Test case 2: creating MW from memh after mkey_pack */
std::vector<uint8_t> base_rkey1 = mkey_pack(base, flags);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need invalidation on first mkey pack of base?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need test with packing without invalidation first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of UCT invalidation would mean that we destroy derived memh. Currently I don't have tests with invalidation, actually it's a good point to add them.
Or I misunderstood your comment

@iyastreb iyastreb mentioned this pull request Jan 9, 2025
rakhmets
rakhmets previously approved these changes Jan 31, 2025
@@ -372,9 +372,17 @@ static ucs_status_t
uct_cuda_ipc_mem_reg(uct_md_h md, void *address, size_t length,
const uct_md_mem_reg_params_t *params, uct_mem_h *memh_p)
{
uct_mem_h base = (params != NULL) ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

params is not checked to be non-NULL in other places. I think we can remove this check here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, initially I didn't have this check, added it to fix NULL pointer crash in CI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember the failed test? Looks like an issue in caller function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember it, but I just see my commit named "Fix NPE" that I've made to address that failure: 53a4b08

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, found and fixed this NULL pointer error in gtest: b193dea

* A pointer to an existing memory handle.
* Used to register a derived memh: a shallow copy of an existing UCT memh
* which can be used to access the same memory region. When created, the
* derived memh inherits the access flags and the state of the original
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we modify the access flags when creating a memory window? i would expect yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not aware of this use case.
Do you propose to update this documentation? Or extend derived memh API so that we can create it with alternate access rights?

@@ -375,6 +375,10 @@ uct_cuda_ipc_mem_reg(uct_md_h md, void *address, size_t length,
uct_cuda_ipc_memh_t *memh;
CUdevice cu_device;

UCT_CHECK_PARAM((params == NULL) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO no need to check (params == NULL)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment I got from Raul)
I added this check intentionally, because some unit test was triggering this call with NULL param.
Ok, let me remove it, so we see exactly that test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, found and fixed this NULL pointer error in gtest: b193dea

@@ -375,6 +375,10 @@ uct_cuda_ipc_mem_reg(uct_md_h md, void *address, size_t length,
uct_cuda_ipc_memh_t *memh;
CUdevice cu_device;

UCT_CHECK_PARAM((params == NULL) ||
UCT_MD_MEM_REG_FIELD_VALUE(params, memh, FIELD_MEMH, NULL) == NULL,
"CUDA IPC does not support derived memory handles");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make this a common macro and use in other memory domains that dont support deriver memh?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can
I think we can use this new macro only in those MDs which advertise INVALIDATION support, but do not properly handle it. These are namely: cuda_ipc_md and cma_md

unsigned mem_flags, size_t memh_base_size,
size_t mr_size, uct_ib_mem_t **memh_p)
static uct_ib_mem_t *
uct_ib_memh_alloc_internal(uct_ib_md_t *md, size_t memh_base_size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make better names, instead of uct_ib_memh_alloc_internal+uct_ib_memh_alloc - uct_ib_memh_alloc+uct_ib_memh_new/init

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uct_ib_memh_alloc is an existing function, so we don't want to change its name
I just extracted some common part from uct_ib_memh_alloc into _internal so that it can be reused by uct_ib_memh_clone.

Maybe we name it uct_ib_memh_alloc_common?

Comment on lines 651 to 656
memh = uct_ib_memh_alloc_internal(md, memh_base_size, mr_size, &memh_size);
if (memh == NULL) {
return UCS_ERR_NO_MEMORY;
}

memcpy(memh, src, memh_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. clone seems wrong - if the derived memh is shallow, why need to fully copy the original memh?
  2. seems weird that we use calloc() and then override everything with memcpy

Copy link
Contributor Author

@iyastreb iyastreb Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering in the opposite order:
2. Right, here malloc would be enough but I'm reusing existing function uct_ib_memh_alloc_internal that does calloc and calculates the size, just to reuse the common code. I think the overhead is minimal.
We could split size calculation into a separate function etc, but I think it's not worth the effort

  1. Ok, maybe it's a terminology issue here
    My understanding is:
    Deep copy creates an independent instance of an object, that can be used apart from the original.
    Shallow copy creates an "alias" that depends on master copy and cannot be used separately.
    The latter is what's implemented in this PR, whatever we name it.

Derived memh is a shallow copy, because it makes a shallow copy of MRs - the most important part of original memh. And original object remains the only owner of the MRs state.
Of course there could be different implementations of derived memh. For example, we can imagine a shallow copy looking like that:

struct {
   uct_ib_mlx5_devx_mem_t *base;
   // derived specific fields
} uct_derived_memh;

Looks ok, right? Still you need to allocate this object.
What are the issues with this approach:

  • There are places where we assume that memh has always the uct_ib_mem_t base: uct_rc_mlx5_txqp_tag_inline_post: ((uct_ib_mem_t*)iov->memh)
    So we must add uct_ib_mem_t super field to our copy
  • The copy must handle a set of rkeys independently from the original memh, meaning that the following fields also needs to be copied:
    struct mlx5dv_devx_obj      *atomic_dvmr;
    struct mlx5dv_devx_obj      *indirect_dvmr;
    uint32_t                    atomic_rkey;
    uint32_t                    indirect_rkey;

This is just to show you that we need to duplicate the significant part of the original memh anyway.

  • Then in each and every place we need to check whether passed memh is derived or original (because their layout is different), adding a lot of boilerplate code and CPU overhead..
  • Then we need to modify quite some existing functions for key generation, because they cache keys in the original object.

To overcome all these issues I make a shallow copy of the original memh (== they have the same memory layout), and use it interchangeably in all the places. This allows me to minimize the amount of ifs in the code, basically we check for derived handle only during init and cleanup. This helps to avoid any refactoring in the existing functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we nee to brainstorm the approaches. I don't like having structs where only some fields used, maybe need a deeper refactoring in the IB memh structure.

/**
* Memory domain supports derived memory handle registration.
*/
UCT_MD_FLAG_DERIVED = UCS_BIT(13)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UCT_MD_FLAG_REG_DERIVED

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const uct_ib_mlx5_devx_mem_t *src,
uct_ib_mlx5_devx_mem_t **memh_p)
{
size_t mr_size = src->super.flags & UCT_IB_MEM_IMPORTED ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@iyastreb iyastreb changed the title UCT: Memory window UCT: Derived memh Feb 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants